Abstract: The volumetric variability of dry tropical forests in Brazil and the scarcity of studies on the
subject show the need to develop techniques that yield adequate and
accurate wood volume estimates. In this study, we analyzed a database of thinning trees from a forest
management plan in the Contendas do Sincorá National Forest, southwestern Bahia State, Brazil. The data
set included a total of 300 trees with trunk diameters ranging from 5 to 52 cm. Four volumetric models
were fitted, validated and statistically compared. Due to the difference in height values for
the same diameter and the low correlation between both variables, we do not suggest models which only use
the diameter at breast height (DBH) variable as a predictor because they produce the largest
estimation errors. In comparing the best single entry model (Hohenald-Krenn) with the Spurr model (best
fit model), excluding height as a predictor yields an Akaike information criterion (AIC) of 136.44 and an
adjusted determination coefficient ($R^2_{adj}$) of 0.93, both of which are poorer than those of the
second best model (Schumacher-Hall). Regarding the minimum sample size, errors in estimation (root mean
square error (RMSE) and bias) of the best model decrease as the sample size increases, especially when a
larger number of trees with DBH≥15.0 cm are randomly sampled. Stratified sampling by diameter class
produces smaller volume prediction errors than random sampling, especially when considering all trees. In
summary, the Spurr and Schumacher-Hall models perform better. These models suggest that the total
variance explained in the estimates is not less than 95%, producing reliable forecasts of the total volume
with bark. Our estimates indicate that the bias around the mean is not greater than 7%. Our results
support the decision to use regression methods to build models and estimate their parameters, seeking
stratification strategies in diameter classes for the sample trees. Volume estimates with valid confidence
intervals can be obtained using the Spurr model for the studied dry forest. Stratified sampling of the data
set for model adjustment and selection is necessary, since we find significant results, with lower RMSE
and bias values, for sample sizes of up to 70% of the total database.


1 Introduction


The Caatinga is an exclusively Brazilian biome occupying approximately 7% of the national 
territory and constitutes one of the largest semi-arid areas in the world (Miles et al., 2006; Queiroz, 
2009; Araújo and Silva, 2010; Moro et al., 2016). The name Caatinga has indigenous origin 
(Tupi-Guarani) and its meaning (white forest, or mata branca in Portuguese) is associated with the 
open and gray aspect of the vegetation component (Moro et al., 2016), which is part of the group of 
seasonally dry tropical forests (Murphy and Lugo, 1986). These forests grow on soils ranging from
clayey and shallow to sandy and underlying (Costa et al., 2014), and are resistant to up to 10
months of drought (Murphy and Lugo, 1995; Pennington et al., 2006; Dalmagro et al., 2014). 

Forest ecosystems with vegetation distributed in complex mosaics and high species endemism, such
as Caatinga dry tropical forests, are among the most fragmented and endangered in the world (Silva
and Bates, 2002; Miles et al., 2006; Werneck, 2011). Thus, studies on monitoring, detecting
and mapping environmental changes, as well as on estimating forest parameters, are
of great importance in this scenario, since they support sustainable forest management aimed at
the conservation and maintenance of the services provided.

The wood volume present in a forest is valuable information for preparing sustainable forest
management plans (SFMPs), as it is the basis for assessing the forest stock of a region and assists 
in planning forest exploitation (Leite and Resende, 2010). In Brazil, the Brazilian Institute of 
Environment and Renewable Natural Resources (IBAMA) requires that the volume of standing 
trees presented in SFMPs should be estimated using volumetric equations (Brasil, 2006). 

Volumetric equations are usually obtained by adjusting mathematical models using regression 
techniques, and constitute the most adopted and efficient procedure for quantifying forest 
production (Cabacinha et al., 2013; Silva-Ribeiro et al., 2014). Many allometric equations have 
been developed for various vegetation types, but they are rarely validated in the field, especially for 
seasonally dry tropical forests such as Caatinga (Sampaio et al., 2010). 

Studies related to volumetric modeling of dry forest phytophysiognomy trees are rare in the 
literature, despite its relevance as a shelter of remarkable biodiversity of fauna and flora 
(Albuquerque et al., 2012). Some authors also reported the difficulty of obtaining good allometric 
adjustments of individual trees of the predominant species in these phytophysiognomies, as they 
usually present tortuous trunks which fork near the ground (Lima et al., 1996). However, Souza et 
al. (2016) recommended the Schumacher-Hall and Spurr linear models to obtain stem volume 
estimates in typical Caatinga shrubby tree vegetation; and Lima et al. (2017) found that total 
volume estimates for Caatinga species in the state of Pernambuco are more accurate when 
dendrometric stem and branch variables are included in the model using the least squares method. 

The different results found reveal the typical volumetric variability of Brazilian dry tropical 
forests conditioned by variations in stem shape, branch shape, density and genetic characteristics 
(Lima et al., 2017). There is a clear need to develop techniques for obtaining adequate and accurate 
volume estimates from tree sample volume data. 

Herein, we analyzed a database of thinning trees from a forest management plan in the
Contendas do Sincorá National Forest (FLONA), southwestern Bahia State, Brazil. Our data set
included a total of 300 trees with trunk diameters ranging from 5 to 52 cm. We addressed the 
following questions: (i) what is the best volumetric model in predicting volume with bark? (ii) what 
is the best tree sampling strategy to produce models? and (iii) how does the number of trees used in 
fitting affect forecast errors with respect to the performance of the best locally derived model? 


2 Materials and methods 


2.1 Study area 


The study was conducted in an area of Caatinga forest (13°55'21"S, 41°06'57"W) belonging to the
FLONA, which has an area of about 1.10×10⁴ hm² and is located south of Chapada Diamantina,
in the municipality of Contendas do Sincorá, southwestern Bahia State, Brazil. The vegetation is
classified as savanna-steppe forest of arid land in a late successional stage, since the last intervention
recorded in the area dates from 1997 (Ministry of the Environment, 2006; Brazilian Institute of
Geography and Statistics, 2012).

The local climate is semi-arid (BSwh) according to the Köppen classification, with a well-defined
dry season and an annual mean temperature of 23.0°C (Alvares et al., 2013). The annual precipitation
is between 596.0 and 678.5 mm, most of which falls from November to April. The altitude
of the region ranges from 295 to 380 m a.s.l., reaching 580 m a.s.l. in mountainous areas. The study area in the
FLONA is located in the Resource Management Zone, determined by the Conservation Unit
Management Plan (Ministry of the Environment, 2006), which provides for and encourages
research and management programs.


2.2 Tree sampling and volume determination 


We randomly selected 48 plots of 20 m × 20 m (400 m²) in order to characterize the diameter
variation of the population. In each plot, we then measured the diameter at breast height (DBH), taken
at 1.3 m from the ground, for all trees with DBH≥5.0 cm. When a single tree had several stems
(tillering), we measured the diameter of all the stems originating below 1.3 m and used the quadratic
diameter to obtain a single diameter per tree. The stem DBH measurements were grouped into
six diameter classes with a width of 8 cm, as shown in Table 1.


Table 1 Diametric distribution of tree stems in the dry tropical forest 


DBH classes (cm)    Class center (cm)    Frequency
5.0-13.0            6.50                 1.2
13.1-21.0           17.05                96.0
21.1-29.0           25.05                25.0
29.1-37.0           33.05                7.0
37.1-45.0           41.05                4.0
45.1-53.0           49.05                2.0


Note: DBH, diameter at breast height. 


We rigorously measured the volume of 300 randomly selected trees distributed among the
diameter classes in proportion to their frequencies (Table 1). On each trunk, we measured the
diameters with bark at 0.1, 0.3, 0.5, 0.7, 1.0, 1.3 and 2.0 m above the ground and calculated the
section volumes using the Smalian method. From 2.0 m upward, the sections were measured at
intervals of 1.0 m up to the height at which the diameter reached 3 cm, and the length of
the tip was then measured. The total volume of each stem was obtained by summing the volume of all
sections plus the tip volume, and the total volume of each tree was obtained by summing the
stem volumes.
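
As an illustration of the section-wise calculation described above, the following Python sketch applies Smalian's formula (mean of the two end cross-sectional areas multiplied by the section length) to paired height and diameter measurements. The function names and the example measurements are hypothetical, and treating the tip as a cone is an assumption made here, since the text does not state how the tip volume was computed.

```python
import math


def basal_area_m2(diameter_cm):
    """Cross-sectional area (m^2) of a circle with the given diameter (cm)."""
    return math.pi * (diameter_cm / 100.0) ** 2 / 4.0


def smalian_stem_volume(heights_m, diameters_cm, tip_length_m=0.0):
    """Stem volume (m^3) summed over consecutive sections with Smalian's formula.

    heights_m and diameters_cm are paired measurements ordered from the base
    upward. The tip is treated as a cone (an assumption, not from the study).
    """
    if len(heights_m) != len(diameters_cm) or len(heights_m) < 2:
        raise ValueError("need at least two paired height/diameter measurements")
    volume = 0.0
    for i in range(len(heights_m) - 1):
        length = heights_m[i + 1] - heights_m[i]
        area_lower = basal_area_m2(diameters_cm[i])
        area_upper = basal_area_m2(diameters_cm[i + 1])
        # Smalian: average of the two end areas times the section length.
        volume += (area_lower + area_upper) / 2.0 * length
    # Cone volume for the tip above the last measured diameter.
    volume += basal_area_m2(diameters_cm[-1]) * tip_length_m / 3.0
    return volume


# Hypothetical stem: measurement heights follow the positions listed in the text.
heights = [0.1, 0.3, 0.5, 0.7, 1.0, 1.3, 2.0, 3.0]
diameters = [14.0, 13.2, 12.8, 12.5, 12.1, 11.8, 9.0, 3.0]
print(round(smalian_stem_volume(heights, diameters, tip_length_m=0.4), 4))
```

For a multi-stem tree, the tree volume would be the sum of this quantity over its stems.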

For individuals with more than one stem, we considered the height of the largest stem as the total
height and obtained the equivalent diameter at 1.3 m (DEq) from the DBH values of the
individual stems of each tree. The DEq assumes that the cross-sectional area (m²) at 1.3 m of a
multi-stem tree is equivalent to the sum of the individual cross-sectional areas of each stem and is
defined by the square root of the sum of squares of the stem DBH values (Fraga et al., 2014).
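
The DEq definition can be checked with a short Python sketch. The function name and the example stem diameters are hypothetical; the calculation itself is only the root sum of squares stated above.

```python
import math


def equivalent_diameter(stem_dbh_cm):
    """Equivalent diameter (cm) of a multi-stem tree: root sum of squares of stem DBH."""
    if not stem_dbh_cm:
        raise ValueError("at least one stem diameter is required")
    return math.sqrt(sum(d * d for d in stem_dbh_cm))


def cross_sectional_area(diameter_cm):
    """Cross-sectional area (cm^2) of a circle with the given diameter (cm)."""
    return math.pi * diameter_cm ** 2 / 4.0


# Hypothetical tree with three stems measured at 1.3 m.
stems = [6.0, 8.0, 5.0]
deq = equivalent_diameter(stems)
# The area of the equivalent circle matches the summed stem areas.
print(round(deq, 2), round(cross_sectional_area(deq), 2),
      round(sum(cross_sectional_area(d) for d in stems), 2))
```

Because the areas are additive, a single-stem tree keeps its measured DBH unchanged.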


2.3 Fitting of volumetric models 


We tested two volumetric models based only on DBH and two combinations of DBH and height 
(H) commonly used to estimate trunk volume in the region: 


Husch model: $VOL = \beta_0 \times DBH^{\beta_1} \times \varepsilon$, (1)
Hohenald-Krenn model: $VOL = \beta_0 + \beta_1 \times DBH + \beta_2 \times DBH^2 + \varepsilon$, (2)
Spurr model: $VOL = \beta_0 + \beta_1 \times (DBH^2 H) + \varepsilon$, (3)
Schumacher-Hall model: $\ln(VOL) = \beta_0 + \beta_1 \times \ln(DBH) + \beta_2 \times \ln(H) + \varepsilon$, (4)


where VOL (m³) is the volume with bark; $\beta_i$ is the parameter to be estimated; DBH (cm) is the
diameter at breast height measured at 1.3 m from trunk length; H (m) is the total height; and $\varepsilon$ is the
random error.
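
To make the fitting step concrete, the sketch below estimates the two parameters of the Spurr form in Equation 3 by ordinary least squares, using the closed-form solution for a single predictor (the combined variable DBH²H). It is a minimal illustration in Python rather than the R workflow used in the study; the function name and the synthetic data are hypothetical.

```python
def fit_spurr(dbh_cm, height_m, volume_m3):
    """OLS estimates (b0, b1) for VOL = b0 + b1 * (DBH^2 * H)."""
    x = [d * d * h for d, h in zip(dbh_cm, height_m)]
    n = len(x)
    if n < 2 or n != len(volume_m3):
        raise ValueError("need at least two complete observations")
    mean_x = sum(x) / n
    mean_y = sum(volume_m3) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, volume_m3))
    b1 = sxy / sxx            # slope on the combined variable
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1


# Synthetic trees generated from known parameters (not data from the study).
dbh = [6.0, 10.0, 15.0, 22.0, 30.0]
height = [4.0, 5.5, 7.0, 8.0, 9.5]
volume = [0.002 + 0.00004 * d * d * h for d, h in zip(dbh, height)]
b0, b1 = fit_spurr(dbh, height, volume)
print(b0, b1)
```

The Schumacher-Hall form in Equation 4 could be fitted in the same spirit after taking logarithms, although it needs a two-predictor least squares solution.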

The parameters of the Hohenald-Krenn, Spurr and Schumacher-Hall models were estimated
using the Ordinary Least Squares (OLS) method. Husch's non-linear model was fitted with
a modified Levenberg-Marquardt algorithm using the "minpack.lm" package. The parameters
were generally calculated using the total tree population sampled in each plot and were assumed to
be true parameters representing trunk volume.

The obtained equations were analyzed by comparing some statistical indices (Vanclay, 1994):
Akaike information criterion (AIC), adjusted determination coefficient ($R^2_{adj}$), root mean square
error (RMSE) and bias, with the formulas described below:


$AIC = -2LL + 2k$, (5)
$R^2_{adj} = R^2 - \left[\frac{k-1}{n-k}\right] \times (1 - R^2)$, (6)
$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (V_i - \hat{V}_i)^2}{n}}$, (7)
$Bias = \frac{\sum_{i=1}^{n} (V_i - \hat{V}_i)}{n} \Big/ \bar{V}$, (8)


where LL is the log-likelihood; k is the number of model parameters; $R^2$ is the coefficient of
determination; n is the number of trees measured; $V_i$ (m³) is the observed volume with bark of tree i;
$\hat{V}_i$ (m³) is the estimated volume with bark of tree i; and $\bar{V}$ (m³) is the arithmetic mean of
volume with bark.
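
The four indices can be computed from observed and predicted volumes as in the following Python sketch. Two points are assumptions made here rather than statements from the text: the log-likelihood is taken as the Gaussian log-likelihood evaluated at the maximum-likelihood residual variance, and residuals are defined as observed minus predicted. The function name and the example numbers are hypothetical.

```python
import math


def fit_statistics(observed, predicted, k):
    """Return AIC, adjusted R^2, RMSE and relative bias for one fitted model.

    k is the number of model parameters. Residuals are observed - predicted.
    """
    n = len(observed)
    if n != len(predicted) or n <= k:
        raise ValueError("need matching lists with more observations than parameters")
    residuals = [o - p for o, p in zip(observed, predicted)]
    rss = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    tss = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - rss / tss
    r2_adj = r2 - (k - 1) / (n - k) * (1.0 - r2)
    rmse = math.sqrt(rss / n)
    bias = (sum(residuals) / n) / mean_obs
    # Assumed Gaussian log-likelihood with variance estimated as rss / n.
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)
    aic = -2.0 * log_lik + 2.0 * k
    return {"aic": aic, "r2_adj": r2_adj, "rmse": rmse, "bias": bias}


# Hypothetical observed and predicted volumes (m^3) for four trees.
stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], k=2)
print(stats)
```

Statistical packages differ in whether the residual variance is counted in k, so AIC values from this sketch are comparable among models fitted the same way but not necessarily with values reported by other software.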

2.4 Minimum sample size for volumetric model development 


2.4.1 Random sampling 
We iteratively generated sub-samples from the total dataset and from trees with DBH greater than
15.0 cm, randomly selecting sub-samples that contained 20%, 50% and 70% of the total number of
trees. For each set of randomly sampled trees, we followed the fitting procedure of the best selected
model (as described in Section 2.3) to obtain volume estimates. As random sampling
produces highly variable adjusted parameters, we repeated the random sampling 1000 times for each
sample size and calculated the mean of each parameter over the 1000 iterations to produce a single average
estimate of $\beta_0$ (the intercept) and $\beta_1$ and $\beta_2$ (the slope parameters) for each sample size, and
to also generate the smallest error and bias measurements in the model calibration.
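
The repeated random sampling described above can be sketched in Python as follows. The example draws a fixed fraction of the trees without replacement, fits a straight line by least squares to each draw, and averages the fitted parameters over the iterations. The function names, the seed and the synthetic data are hypothetical, and a single-predictor line stands in for the best selected model.

```python
import random


def ols_line(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def mean_parameters(x, y, fraction, iterations=1000, seed=0):
    """Average OLS parameters over repeated random sub-samples of a given fraction."""
    rng = random.Random(seed)
    size = max(2, round(fraction * len(x)))
    sum_b0 = 0.0
    sum_b1 = 0.0
    for _ in range(iterations):
        idx = rng.sample(range(len(x)), size)  # sampling without replacement
        b0, b1 = ols_line([x[i] for i in idx], [y[i] for i in idx])
        sum_b0 += b0
        sum_b1 += b1
    return sum_b0 / iterations, sum_b1 / iterations


# Synthetic predictor and response lying exactly on a line (not data from the study).
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * xi for xi in x]
for fraction in (0.2, 0.5, 0.7):
    print(fraction, mean_parameters(x, y, fraction, iterations=200, seed=1))
```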
2.4.2 Stratified random sampling 
We alternatively simulated a stratified random sampling of trees in six size classes (5.0-7.0,
7.0-9.0, 9.0-11.0, 11.0-13.0, 13.0-15.0 and >15.0 cm) for all trees with DBH>5.0 cm, and in four size
classes (15.0-20.0, 20.0-25.0, 25.0-30.0 and >30.0 cm) for trees with DBH>15.0 cm. Similarly, 20%, 50% and 70% of
the trees within a given class were randomly chosen (Duncanson et al., 2015). The sample size was
increased by selecting trees at random from both the total data and trees with DBH>15.0 cm. We
also performed bootstrap in this procedure until we found the smallest measurements of error and
bias in the model calibration. The best local model was developed for each subset and applied to
the remaining data for cross validation.
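
A stratified split of this kind can be sketched in Python as below. The class limits follow the six size classes listed above; assigning a tree that falls exactly on a shared limit to the upper class, and always taking at least one tree per occupied class, are assumptions made here. The function names, the seed and the example diameters are hypothetical.

```python
import random

# Lower limits (cm) of the six size classes; the last class collects trees >15.0 cm.
CLASS_LOWER_LIMITS_CM = [5.0, 7.0, 9.0, 11.0, 13.0, 15.0]


def diameter_class(dbh_cm, lower_limits=CLASS_LOWER_LIMITS_CM):
    """Index of the size class whose lower limit is the largest one not above dbh_cm."""
    index = 0
    for i, lower in enumerate(lower_limits):
        if dbh_cm >= lower:
            index = i
    return index


def stratified_split(dbh_cm, fraction, seed=0):
    """Return (fitting indices, validation indices) sampled within each size class."""
    rng = random.Random(seed)
    strata = {}
    for i, d in enumerate(dbh_cm):
        strata.setdefault(diameter_class(d), []).append(i)
    fit_idx = []
    for members in strata.values():
        take = max(1, round(fraction * len(members)))  # at least one tree per class
        fit_idx.extend(rng.sample(members, take))
    chosen = set(fit_idx)
    validation_idx = [i for i in range(len(dbh_cm)) if i not in chosen]
    return sorted(fit_idx), validation_idx


# Hypothetical diameters (cm); the trees left out serve as the validation set.
dbh = [5.5, 6.0, 6.5, 6.9, 7.5, 8.0, 16.0, 20.0, 30.0, 40.0]
fit_idx, validation_idx = stratified_split(dbh, fraction=0.5, seed=42)
print(fit_idx, validation_idx)
```

For the subset of trees with DBH>15.0 cm, the same function could be reused by passing the lower limits 15.0, 20.0, 25.0 and 30.0 to `diameter_class`.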

We used R statistical software (R Development Core Team, 2019) in all computations and 
analyses. 


3 Results 


3.1 Volumetric models 


Among the four adjusted volumetric models, the Spurr and Schumacher-Hall models perform
better (smaller AIC; Table 2). These models suggest that the total variance explained
in the estimates is not less than 95% ($R^2_{adj}$), producing reliable predictions of the total volume with
bark. Our estimates indicate that bias around the mean is not greater than 7% (bias=0.07; Table 2).


Table 2 Estimates of parameters and adequacy indices of four volumetric models of individual trees adjusted for
the dry tropical forest


Model             β0 (±SE)          β1 (±SE)         β2 (±SE)          AIC      RSE    R²adj   RMSE   Bias
Husch             −8.613 (±0.082)   2.303 (±0.036)   -                 141.45   0.30   0.93    0.06   0.10
Hohenald-Krenn    −9.504 (±0.346)   3.058 (±0.286)   −0.151 (±0.057)   136.44   0.30   0.93    0.04   0.09
Spurr             −9.914 (±0.089)   1.013 (±0.013)   -                 58.88    0.27   0.95    0.06   0.07
Schumacher-Hall   −9.792 (±0.140)   2.060 (±0.040)   0.909 (±0.092)    59.60    0.27   0.95    0.06   0.07


Note: βi, the parameters of the adjusted models with their respective confidence intervals obtained by standard error of the mean (P>0.05);
SE, standard error; AIC, Akaike information criterion; RSE, residual standard error; R²adj, adjusted determination coefficient; RMSE, root mean square error; -, no data available.


None of the adjusted models shows lack of fit according to the F-test. All selected models
present parameters with valid confidence intervals (P>0.05). Although the volume estimates are
similar across all four models for trees up to 25.0 cm in diameter (Fig. 1), there is a
noticeable divergence for the larger diameter ranges. This divergence is mainly attributable to the
height data: double entry models result in significantly lower percentage biases than models that do
not include height.


[Fig. 1 image: panels (a) Fitting and (b) Validation; x-axis, DBH (cm); y-axis, Volume (m³); curves for the Husch, Hohenald-Krenn, Spurr and Schumacher-Hall models]


Fig. 1 Fitting (a) and validation (b) regression curves (color lines) and 95% confidence intervals (CIs; grey
envelopes) of four models (detailed in Table 2) relating volume and diameter at breast height (DBH) of trees in the
dry tropical forest


Due to the difference in height values for the same diameter and the low correlation between
both variables, we do not suggest models which only use the DBH variable as a predictor because
they produce the largest estimation errors. In comparing the best single entry model
(Hohenald-Krenn) with the Spurr model (best fit model), excluding height as a
predictor yields an AIC of 136.44 and an $R^2_{adj}$ of 0.93, both of which are poorer than those of the second best
model (Schumacher-Hall). Therefore, our results suggest that it is more parsimonious to maintain a
volumetric Spurr model obtained for the entire FLONA.

The Spurr equation predicts that logarithmic transformation of the combination of tree diameter 
and height variables for a given volume decreases the bias in the estimate. The functional form of 
the equations tested is biologically consistent, especially with the inclusion of the height variable, 
and these results can be illustrated in the residual distribution (Fig. 2). 


3.2 Minimum sample size 


Regarding the minimum sample size, there are three important and visible trends shown in Table 3 
and Figure 3. First, errors in estimation (RMSE and bias) of the best model decrease as the sample 
size increases, especially when a larger number of trees with DBH>15.0 cm are randomly sampled. 
Second, stratified sampling by diameter class produces smaller volume prediction errors than 
random sampling, especially when considering all trees. This is most evident when we evaluate the 
minimized values of the differences between predicted and observed data. We further note that a
significant error reduction occurs up to a 50% sample size considering random sampling of both all
trees and those with DBH>15.0 cm, while the size of up to 70% maintains a large mean difference
for all trees considering stratified sampling (see bias in Table 3). Third, there is a considerable
variability in volume estimates for the area, and this variability also decreases with increased
sample size percentage.


Fig. 2 Residual distribution of wood volume for the best equation (Spurr) obtained for the trees in the dry tropical
forest. (a), fitting; (b), validation.


Table 3 Statistical criteria for different tree sampling strategies 


Sampling strategy           RMSE     RSE      CV (%)   Bias      Percentile (%)
All trees (stratified)      0.0559   0.2644   65       −0.0710   20
All trees (stratified)      0.0549   0.2632   63       −0.2452   50
All trees (stratified)      0.0541   0.2642   63       −0.0851   70
DBH>15.0 cm (stratified)    0.1120   0.2681   38       0.4497    20
DBH>15.0 cm (stratified)    0.1066   0.2672   36       −0.4606   50
DBH>15.0 cm (stratified)    0.1083   0.2715   37       −0.3180   70
DBH>15.0 cm (random)        0.2400   0.2668   81       9.4526    20
DBH>15.0 cm (random)        0.1373   0.2806   46       3.8118    50
DBH>15.0 cm (random)        0.1345   0.2804   45       3.5025    70


Note: RMSE, root mean square error; RSE, residual standard error; CV, coefficient of variation.


Fig. 3 Root mean square error (RMSE; a) and bias (b) of estimated total volume using different percentiles of data
size to adjust and validate the best local model


4 Discussion


Regression analysis has been widely used to solve forest problems,
especially when estimating forest parameters through biometric relationships (Robinson and
Hamann, 2011; Burkhart and Tomé, 2012; Lima et al., 2014). The use of regression models that
can accurately determine forest production based on wood volume estimation is critical to
implementing sustainable management (Berger et al., 2014; McRoberts and Westfall, 2014); this
allows estimates of the monetary value of forests (Burkhart and Tomé, 2012).

Logarithmic models have been widely used in studying biometric relationships, mainly for
developing biomass and volume equations in natural forests (Chave et al., 2005; Segura and
Kanninen, 2005; Litton and Boone Kauffman, 2008; Basuki et al., 2009). These models are
commonly used in Brazil, particularly in the different tropical dry forests (Lima et al., 2017),
although few studies have back-transformed volume from the logarithmic scale to the
original scale using a correction factor (Vibrans et al., 2015). In addition, robust criteria for
assessing model fit quality, such as AIC or the Bayesian information criterion, are rarely used and
should be incorporated into statistical model fitting routines (Zeng and Tang, 2011). Our results
support the decision to use regression methods to build models and estimate their parameters.
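
The back-transformation mentioned above can be illustrated with a short Python sketch that applies the Schumacher-Hall coefficients from Table 2. The text does not state which correction factor is meant; the sketch assumes the commonly used factor exp(RSE²/2), which presumes natural logarithms and approximately normal residuals, and it assumes that the RSE of 0.27 in Table 2 is expressed on the logarithmic scale. The function names and the example tree are hypothetical.

```python
import math

# Schumacher-Hall coefficients and residual standard error as listed in Table 2.
B0, B1, B2 = -9.792, 2.060, 0.909
RSE = 0.27  # assumed to be on the natural-log scale


def correction_factor(rse):
    """Assumed log-bias correction factor exp(rse^2 / 2)."""
    return math.exp(rse ** 2 / 2.0)


def schumacher_hall_volume(dbh_cm, height_m, corrected=True):
    """Volume (m^3) back-transformed from ln(VOL) = B0 + B1 ln(DBH) + B2 ln(H)."""
    log_volume = B0 + B1 * math.log(dbh_cm) + B2 * math.log(height_m)
    volume = math.exp(log_volume)
    return volume * correction_factor(RSE) if corrected else volume


# Hypothetical tree with DBH = 20 cm and total height = 8 m.
print(schumacher_hall_volume(20.0, 8.0, corrected=False))
print(schumacher_hall_volume(20.0, 8.0, corrected=True))
```

Under these assumptions the correction raises every back-transformed estimate by the same proportion, so omitting it would lead to a systematic underestimate of volume on the original scale.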

Although much of the variation in volume is explained by diameter alone, the improvement is
considerable when the height variable is included. Overall, all of the statistical evaluation
criteria reveal that double-entry equations predict volume more accurately, especially the
equation obtained from the Spurr model. Therefore, our results support including the height
variable in the volume estimate, since single entry models assume that trees of the same diameter
have the same height, which is not true for tropical forests (Brando, 2018; Sullivan et al.,
2018).

However, it is important to ensure that the predominant tree growth forms and architecture are
represented among the sampled trees used to develop the volumetric model (Duncanson et al.,
2015). The accuracy of volume predictions tends to improve when height is included as an
explanatory variable, provided that an appropriate tree sample is used and measurement errors are
small, despite the added uncertainty of using a model with both height and diameter (Kachamba and Eid, 2016).

Regarding the minimum sample size, we generally notice that stratified sampling presents a 
higher precision than random sampling, especially when considering trees with DBH>15.0 cm in 
the adjustment of the models. This is because random sampling distorts the sampling for trees with
DBH≥15.0 cm, causing larger errors in the simulations and allowing a less representative sample
for smaller trees (Duncanson et al., 2015). 

Focusing on the results of stratified sampling, as these are probably more representative of forest
measurement, it can be observed that the sample size of 70% of the total trees is in agreement with
the average sampling obtained by Jenkins et al. (2003) and Duncanson et al. (2015). In stratified
sampling, both the set of all trees and the set of trees with DBH≥15.0 cm are likely to have higher mean
prediction errors at small sample sizes because few trees are simulated per diameter class, and volume estimation
biases increase with small sample size (Sullivan et al., 2018). The higher proportion of trees in
the stratified sampling yields a more reliable volume estimate at the 70% level of data. Therefore,
our results suggest that stratifying data across different DBH size classes is the most efficient way
to develop a generic volumetric model.

This work seeks to fill the gap about the validity of volumetric equations developed for dry 
tropical forests, although some research has already suggested the development of individual 
equations for species (Abreu et al., 2016). The issue currently being discussed and reported in this 
paper is whether it would be better to use generic volume equations or form factors from other 
locations, or to develop site-specific equations in locations where no generic volumetric equation is 
available. 

However, there is a lack of guidelines for selecting existing volumetric models and validating 
alternatives. The limiting factor has always been the destructive sampling of trees for model 
adjustment and selection (Chave et al., 2014). Highly accurate estimates of individual tree volumes 
and biomasses are increasingly available through Lidar technology (Watt et al., 2013; Levick et al.,
2016; Oliveira et al., 2018). These estimates do not require destructive tree sampling and can be
performed systematically in the field (Duncanson et al., 2015, 2017; Duncanson and Dubayah, 
2018). A system could be developed to adequately sample tree volume data in situ at environmental 
gradients with appropriate sampling, providing a potential solution to outstanding problems related 
to forest biomass and carbon stock. 


5 Conclusions 


Volume estimates with valid confidence intervals can be obtained with the Spurr model for the
FLONA. Stratified sampling by diameter class for model adjustment and selection proved
necessary, since we found significant results with lower RMSE and bias values for up to 70% of
the total database and also when only considering trees with DBH>15.0 cm. These results can be
used by forest managers as a technical tool in predicting the volume of dry tropical forests in 
southwestern Bahia State, Brazil. Further studies should clarify the mechanisms for developing 
specific equations at the ecological group, family and commercial species levels.